Fix GPU Softmax NaN propagation for mixed infinite inputs #33498
Conversation
```c
// Handle IEEE-754 case when max_value is INF
if (isinf((float)max_value)) {
    for (cls = 0; cls < class_num; ++cls) {
        ACCUMULATOR_TYPE v = data[cls * TMP_CLASS_PITCH];
        if (v == max_value)
            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)NAN;
        else
            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)0.0f;
    }

    // Write results and exit
    for (cls = 0; cls < class_num; ++cls) {
#if INPUT0_SIMPLE == 1
        const uint output_idx = out_depth_offset + cls*OUTPUT_CLASS_PITCH;
#else
#if INPUT0_DIMS == 5
        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, z + *z_offset, y + *y_offset, x + *x_offset);
#else
        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, y + *y_offset, x + *x_offset);
#endif
#endif
        output[output_idx] = data[cls * TMP_CLASS_PITCH];
    }
    return;
}
```
I think we need to apply fused ops for the reported case too.
Please check my suggestion below.
```diff
-// Handle IEEE-754 case when max_value is INF
-if (isinf((float)max_value)) {
-    for (cls = 0; cls < class_num; ++cls) {
-        ACCUMULATOR_TYPE v = data[cls * TMP_CLASS_PITCH];
-        if (v == max_value)
-            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)NAN;
-        else
-            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)0.0f;
-    }
-    // Write results and exit
-    for (cls = 0; cls < class_num; ++cls) {
-#if INPUT0_SIMPLE == 1
-        const uint output_idx = out_depth_offset + cls*OUTPUT_CLASS_PITCH;
-#else
-#if INPUT0_DIMS == 5
-        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, z + *z_offset, y + *y_offset, x + *x_offset);
-#else
-        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, y + *y_offset, x + *x_offset);
-#endif
-#endif
-        output[output_idx] = data[cls * TMP_CLASS_PITCH];
-    }
-    return;
-}
+for (cls = 0; cls < class_num; ++cls) {
+    // Handle IEEE-754 case when max_value is INF
+    if (isinf((float)max_value)) {
+        if (data[cls*TMP_CLASS_PITCH] == max_value)
+            data[cls*TMP_CLASS_PITCH] = TO_ACCUMULATOR_TYPE(NAN);
+        else
+            data[cls*TMP_CLASS_PITCH] = TO_ACCUMULATOR_TYPE(0.0f);
+    } else {
+        ACCUMULATOR_TYPE t = native_exp(data[cls*TMP_CLASS_PITCH] - max_value);
+        denominator += t;
+        data[cls*TMP_CLASS_PITCH] = t;
+    }
+}
+// ....
+for (cls = 0; cls < class_num; ++cls) {
+    ACCUMULATOR_TYPE res = data[cls*TMP_CLASS_PITCH];
+    if (!isinf((float)max_value)) {
+        res = res / denominator;
+    }
```
Thanks for the suggestion, but I've already reworked the INF handling so it is processed inside the main computation loops. This ensures fused ops and activation are applied consistently on both the INF and non-INF paths. I also verified the original issue cases and the mixed-INF edge cases against the CPU implementation.
e-ddykim left a comment
Additionally, could you please add unit tests for the issue case?
I've added comprehensive GPU unit tests covering all reported IEEE-754 edge cases (mixed INF, multiple INF, negative INF, and NaN propagation).

build_jenkins
src/tests/functional/plugin/shared/include/single_op_tests/softmax.hpp (outdated, resolved)
src/plugins/intel_gpu/tests/functional/single_layer_tests/softmax.cpp (outdated, resolved)
…max.cpp Co-authored-by: Pawel Raasz <pawel.raasz@intel.com>
Applied the suggestion, kindly review.

Waiting for PR approval, kindly review.
praasz
left a comment
OK for the generic test part. @e-ddykim, @Lyamin-Roman, could you review?
🐞 Bug Fix: Correct NaN Propagation in GPU Softmax with Mixed Infinite Inputs
This PR fixes a numerical stability bug in the GPU implementation of the Softmax operation where inputs containing a mix of inf and finite values produce incorrect results.

🔬 Problem Summary

For an input such as:

[inf, 1.0, 2.0]

the mathematically correct (IEEE-754 compliant) Softmax result is:

[nan, 0, 0]
However, the GPU kernel previously produced:
[nan, nan, nan]
while the CPU implementation already behaves correctly.
The root cause is that when max_value == inf, the kernel evaluates:

exp(inf - inf) → NaN

which contaminates the denominator and propagates NaNs to all output positions.
🛠️ Solution

The GPU Softmax kernel now explicitly detects the max_value == inf case and applies IEEE-754 semantics:

- elements equal to max_value → NaN
- all other elements → 0.0

This avoids unstable exponential evaluation and ensures GPU output exactly matches CPU behavior.
The fix is minimal, local to the kernel, and does not affect performance for normal inputs.
📈 Impact
| Input | Expected (CPU) | GPU before fix | GPU after fix |
|---|---|---|---|
| [inf, 1, 2] | [nan, 0, 0] | [nan, nan, nan] | [nan, 0, 0] |
| [inf, -inf, 1] | [nan, 0, 0] | [nan, nan, nan] | [nan, 0, 0] |
| [-inf, 1, 2] | [0, .2689, .7311] | [0, .2689, .7311] | [0, .2689, .7311] |

🧩 Files Modified
src/plugins/intel_gpu/src/kernel_selector/cl_kernels/softmax_gpu_ref.cl
🧪 Testing
Verified against the reproducer from issue #33456.
GPU output now matches CPU output for all reported edge cases.
🔗 Related Issue
Fixes #33456